Abstract: Research on deep reinforcement learning (DRL), both theoretical and applied, continues to deepen, and DRL now plays an important role in games, robot control, dialogue systems, autonomous driving and other areas. Meanwhile, DRL still suffers from shortcomings such as the exploration-exploitation dilemma, sparse rewards, the difficulty of sample collection and poor model stability, and researchers have proposed various solutions to these problems. New theoretical results have further promoted the development of DRL and opened up several new research directions in reinforcement learning, such as imitation learning, hierarchical reinforcement learning and meta-learning. This paper briefly introduces DRL theory, its difficulties and its applications, and explores and summarizes the future development of DRL.
WAN Lipeng, LAN Xuguang, ZHANG Hanbo, ZHENG Nanning. A Review of Deep Reinforcement Learning Theory and Application. Pattern Recognition and Artificial Intelligence, 2019, 32(1): 67-81.